For joint-action learners, a team stochastic game framework for multi-agent cooperative reinforcement learning (MACRL) is established. To solve the problem of multiple optimal equilibria in stochastic games, a MACRL method based on joint action priority sequences, called MACRL-JAPS, is proposed. The main research achievements and innovations are two MACRL methods for the pursuit game, both of which have been validated by experiments.
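The abstract does not describe how MACRL-JAPS chooses among several optimal equilibria. As a hypothetical illustration only, not the author's algorithm, the following sketch assumes a team Q-table over joint actions in a single state and a priority sequence over joint actions that all agents share. Each agent computes the set of optimal joint actions, selects the one that appears first in the shared sequence, and plays its own component, so all agents land on the same equilibrium. The function names and the lexicographic priority order are illustrative assumptions.

```python
from itertools import product


def optimal_joint_actions(q, tol=1e-9):
    """Return every joint action whose team Q-value is within tol of the maximum."""
    best = max(q.values())
    return [ja for ja, v in q.items() if abs(v - best) <= tol]


def select_by_priority(q, priority):
    """Pick the optimal joint action that appears earliest in the shared priority sequence."""
    optimal = set(optimal_joint_actions(q))
    for ja in priority:
        if ja in optimal:
            return ja
    raise ValueError("priority sequence covers no optimal joint action")


def agent_action(q, priority, agent_index):
    """Each agent independently derives the same joint action and plays its own component."""
    return select_by_priority(q, priority)[agent_index]


# Two agents with two actions each: a coordination game with two optimal equilibria.
actions = [0, 1]
q = {ja: 0.0 for ja in product(actions, actions)}
q[(0, 0)] = 10.0
q[(1, 1)] = 10.0

# Hypothetical shared priority sequence: lexicographic order over joint actions.
priority = sorted(q)  # [(0, 0), (0, 1), (1, 0), (1, 1)]

a1 = agent_action(q, priority, 0)
a2 = agent_action(q, priority, 1)
print((a1, a2))  # (0, 0)
```

Reversing the priority sequence makes both agents switch to (1, 1) instead. Without a common ordering, each agent could settle on a different optimal joint action and miscoordinate on (0, 1) or (1, 0), which both have value 0.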